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Abstract 

The value of a continuous character evolving on a phylogenetic tree is commonly mod- 
elled as the location of a particle moving under one-dimcnsional Brownian motion with 
constant rate. The Brownian motion model is best suited to characters evolving under neu- 
tral drift or tracking an optimum that drifts neutrally. We present a generalization of the 
Brownian motion model which relaxes assumptions of neutrality and gradualism by consid- 
ering increments to evolving characters to be drawn from a heavy-tailed stable distribution 
(of which the normal distribution is a specialized form) . We describe Markov chain Monte 
Carlo methods for fitting the model to biological data paying special attention to ancestral 
state reconstruction, and study the performance of the model in comparison with a selection 
of existing comparative methods, using both simulated data and a database of body mass 
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in 1,679 mammalian species. We discuss hypothesis testing and model selection. The new 
model is well suited to a stochastic process with a volatile rate of change in which biological 
characters undergo a mixture of neutral drift and occasional evolutionary events of large 
magnitude. 

Statistical methods that take into account the dependencies introduced into comparative data 
by phylogenetic relatedness are fundamental to hypothesis testing and exploration in comparative 



biology (Harvey and Pagel 1991 Martins 1996). Each comparative method implicitly imputes 



to the evolutionary process some specific stochastic model (Pagel and Harvey 19891. The va- 



lidity of phylogenetic comparative methods depends on the degree to which historical events 
can be accommodated by the underlying stochastic model of trait evolution, and a mismatch 
between model and reality can yield erroneous statistical results, especially in the reconstruc- 



tion of inaccurate ancestral character states ( Cunningham 1999 Oakley and Cunningham 2000 



Finarelli and Flynn 2006 Slater et al. 2012 ). For this reason it is important to develop realistic 



stochastic models of character evolution (and to constrain those models using empirical data 
where possible). 

The Brownian motion model of evolution - in which the value of a continuous trait evolves 
by accruing incremental changes drawn from a random distribution with zero mean and finite 
constant variance, such that the sum of many increments is distributed according to a normal 



density (Harvey and Pagel 19911 - was introduced to model changes in gene frequencies by 



Cavalli-Sforza and Edwards ( 1967 1 but now underlies (directly or indirectly) a range of popular 
methods for the analysis of continuous traits distributed over phylogenetic trees. In the realm 
of ancestral state reconstruction the value of a trait evolving across a phylogenetic tree can 
at all times be shown to be probabilistically distributed according to a normal distribution 



with variance depending only on tree topology and branch lengths (Maddison 19911 McArdle 



and Rodrigo 1994 Schluter et al. 1997), while in the realm of regression analysis the model 



yields normally distributed residuals and a covariance matrix depending only on tree topology 



and branch lengths ( Grafen[ 1989), both cases giving rise to simple and analytically-tractable 
solutions. The Brownian walk of a trait value can be regarded as a model of gradualistic neutral 
evolution since variation in the trait arises from a process of random drift over the branches of 
a phylogeny at a constant rate and without directionality. A further application thus involves 
identifying significant departures from the expectation of the Brownian motion model as a means 



of detecting adaptive variation in the tempo and mode of trait evolution ( Harmon et al. 2003 



2010| [CTMeara et al.[ [2006| ). 

We here describe a simple generalization of the Brownian motion model of continuous charac- 
ter evolution which extends the model to include cases where increments to an evolving character 
may arise from a symmetric stochastic process but without assuming constant finite variance. 
Not only is the Brownian assumption of constant finite variance, to the best of our knowledge, 
unverified in existing biological systems, but its relaxation may better suit the domain of applica- 
tion of the standard Brownian motion model, namely biological characters likely subject to some 
degree of selection, and in many cases it may offer a more robust form of statistical inference 
with respect to outliers in the data. Broadly speaking, we conceptualize diversifying selection on 
continuous characters as causing an increase and purifying selection on such characters a decrease 
in the rate of evolutionary change. The relative frequencies of these forms of natural selection 
along with neutral drift are expected to generate — for sums of evolutionary increments over 
long periods of time — limit distributions with heavier tails than expected under the Brownian 
motion model. 

Just as the limit distribution of sums of variates drawn from a distribution with constant 
finite variance is the normal distribution, so the limit distribution of sums of variates drawn 



from a distribution without fixed finite variance is the stable distribution ( Gnedenko and Kol- 



mogorov[ 1949/54} [Samorodnitsky and Taqqu 1994 Uchaikin and Zolotarev 1999), which has 



the normal distribution as a special case and which otherwise is characterized by heavy tails, 



closure under convolution and, potentially, by skew (Nolan 1997). Stochastic processes and 



random walks with volatile variance and heavy tails have previously been modelled robustly 



using stable distributions in areas as diverse as the study of fractional diffusion ( Gorenflo et al. 



2002), earthquake forecasting (Lavallee 20031, signal processing (Nikias 1995), animal foraging 



(Viswanathan et al. 1996), rainfall modelling (Menabde and Sivapalan 2000), commodity pric- 



ing (Weron 20061, real estate markets (Young 2008), foreign exchange rates (Fofack and Nolan 



2001), financial statistics (Rachev 2003), image processing (Arce 2005) and telecommunications 



management (Mikosch et al. 2002). Landis et al. (2013) recently used a Brownian motion model 



with Levy stable jumps to model jumps in the evolution of continuous traits. According to our 
view of selection resulting in the generation of evolutionary rate volatility we limit our attention 
in this paper to the symmetric zero-centred stable distribution parameterized by a, the index 
of stability and c, the scale. We model evolution using the stable generalization of Brownian 



motion, the stable random walk ( Samorodnitsky and Taqqu 1994). The stable random walk 



shares some attractive properties of Brownian motion, the most important being closure under 
convolution, such that the sum of several stable distributions is itself a stable distribution with 
the same a parameter. This closure under linear transformation contrasts with all other non- 
stable heavy tailed distributions, which may yield complex and analytically intractable mixtures 
in convolution. Thus, after accumulating increments from a symmetrical zero-centred stable 
distribution, the value of an evolving trait is always probabilistically distributed according to 
a stable distribution with mean equal to the trait value prior to accumulation and with scale 
proportional to the number of increments, or in phylognetic parlance the branch length. 

Figure 1 illustrates the log probability densities of some unit-scale zero-centred symmetri- 
cal stable distributions that differ in the value of a, including the special cases of the normal 
distribution (a — 2) and the Cauchy distribution (a ~ 1). Figure 2 illustrates some stable ran- 



dom walks driven by accumulation of increments from stable noise with various a values. It is 
clear from inspection of these figures that declining a is associated with increasingly heavy tails, 
translating into increasingly volatile random walks which exhibit Brownian-like drift (associated 
with increments drawn from the high-probability modal region of the underlying distribution) 
interspersed with occasional rapid jumps in trait value (associated with increments drawn from 
the low-probability heavy tails). 

As a motivating example of how the accommodation of evolutionary rate volatility may affect 
the inference of evolutionary processes, we present a simple toy model of continuous character 
evolution on a phylogenetic tree with six tips. The values of an evolving character were simulated 
under the Brownian motion model and values at internal nodes were reconstructed based on 
values at the tips only, first by fitting a Brownian motion model and second by fitting a stable 
model using the method presented later in this paper. Next, the value of the character on a 
single tip was artificially inflated by a factor of ten, to represent a bout of adaptive evolution 
(or other event such as measurement error) on the branch leading to that tip. Again, values at 
internal nodes were reconstructed under both models. Results are illustrated in Figure 3. Both 
reconstruction methods yield similar ancestral states for the original Brownian motion data, but 
differ for the manipulated data. The Brownian motion model exhibits an "averaging eff'ect" 
in which the apparently high rate of evolution resulting from the manipulation is distributed 
somewhat evenly over all the branches, causing a large increase in estimated ancestral states 
for several internal nodes. The stable model, however, can entertain rare increments of large 
magnitude, and so is not strongly aff'ected by the manipulation; the apparently high rate of 
evolution on a single branch can be accommodated as a rare event in a heavy-tailed stochastic 
process. 

In this paper we describe the stable model and the procedures used to fit it to empirical 

data, with an emphasis on ancestral state reconstruction. We outline strategies for hypothesis 
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testing and model selection. We apply the stable model to simulated data in order to estimate 
error rates of the model selection procedure, and also use it, alongside a number of alternative 
models, in an illustrative case study of ancestral state reconstruction based on a a large dataset 
of mammalian body masses. Finally we discuss the relationship between the stable model of 
trait evolution and a number of alternative extensions of the Brownian motion model that have 
been proposed in the literature, and suggest avenues for further development. 

Methods 
Model 

Consider a rooted phylogenetic tree T which may or may not contain polytomies. A continu- 
ous character X evolves along the branches of T, taking values bi and 62 at the beginning and 
end, respectively, of each branch b. Under the standard Brownian motion model of evolution, 
the continuous character evolves by accumulating random independent increments drawn from 
a probability distribution with constant mean zero and constant finite variance a^. Accord- 
ing to the central limit theorem, the sum of such increments along a branch b of length tb is 
probabilistically distributed according to a normal density with mean zero and variance ffccr^, a 
density which we denote ^{b2 — bi;tb<T^). Given the independence of increments, and therefore of 
branches, the likelihood of an ancestral state reconstruction of a continuous character evolving 
under Brownian motion is given by the product: 

Lpii,a;T) = l[cj>{b2-bv,tn(T^) (1) 
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If the variance of the increment generating distribution is not constant and finite (as we 
suppose to be the case under departures from neutrality and gradualism) then according to 
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the generalized central limit theorem the limit distribution for the sum of random independent 
variates is not normal but falls into the more general class of stable distributions, parameterized 
by an index of stability a and a scale c. The symmetric stable distribution has probability density 



denoted S(a;;a,c). Following Matsui and Takemura (2004), the unit stable density with c = 1 
may be defined as: 



S(a;;a, 1) = — — / G(k; a, x) exp(— G(k; a, a;))d«; (2) 

7r|a - \\x Jo 



where 



^, . /a::cosK\ cos(a — ilK 

G{K;a,x) = [- , (3) 

V sm aK / cos k 

and the general symmetrical stable density S(a;; a, c) is a transformation of the unit stable density 
S(f;a,l) /c. 

One important property of the stable distribution is that the normal distribution is a special 
case with a — 2. For the zero-centred symmetrical cases treated here, we note that: 



{x;a')^S(x;2,^] (4) 



Furthermore, the sum of t variates drawn from a stable distribution S{a,c) is distributed as 
S (^a,{tc°')^^. Thus, under a stable model of evolution, the likelihood of an ancestral state 
reconstruction of X is given by: 

L{X,a,c;T) =Y[S (b2 - h;a,{hc'')i) (5) 

6 

which is functionally identical to Equation ([T]) when a = 2. 

Unfortunately, there is no analytical solution to the stable probability density function in 
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Equation ([2|, so it is necessary to employ numerical methods (Brorsen and Yang 1990 Nolan 



1997 


McCuUoch 


1998b 


Nolan 


2001 



stable models. The model does not lend itself to direct maximum likelihood estimation of pa- 
rameters due to the existence of a highly multimodal likelihood surface and because arbitarily 
high likelihoods can be obtained by setting 61 — 62 for any single internal branch 6 and having 



0, a problem exhibited by other statistical models with non-constant variance ( Ciuperca 



et al. 2003 ) , and circumvented through the placing of an appropriate prior on the scale parame- 



ter that penalizes the approach to zero, and fitting the model using a Bayesian approach. Since 



numerical estimation methods are unreliable under extremely heavy tails (i.e. a < 0.2) (Matsui 



and Takemura 2004 ) we apply flat or triangular priors to the index of stability on the domain 



0.2 < a < 2, and a loose inverse gamma prior on the scale parameter which has Pr{c — !■ 0) — > 0. 



MCMC estimation 

Markov chain Monte Carlo (MCMC) methods are widely used to estimate complex multivari- 
ate probability densities in numerous biological fields. The goal of such methods is to generate 
a sample from a probability distribution by constructing a Markov chain that has the desired 
distribution as its equilibrium density. A common strategy is to utilize a Metropolis-Hastings 



sampler (Metropolis et al. 1953) in which the statistical model is initialized with some set of 
parameter values 9, a new candidate parameter 9' is generated by a symmetrical proposal distri- 
bution, and accepted as the next step of the Markov chain with probability equal to P{9')/P{9). 
We found that the Metropolis-Hastings sampler performed poorly in estimating ancestral states 
and parameters of the stable model due to the multimodality of the local likelihood surface (Fig- 
ure 4) and the non-independence of ancestral state values, which together generate numerous 
very small "islands" of high likelihood which are unlikely to be explored by the Markov chain 
in a reasonable amount of time. Modified versions of the procedure, such as Metropolis-coupled 



Markov chain Monte Carlo (Geyer 1991) did not yield any benefit 



We found that implementation of a slice sampler ( Neal 2003[ ) manifestly improved the mixing 



of Markov chains. At each step of the chain, the value of each individual parameter di is replaced 
by a new value 0^ drawn randomly from the conditional probability distribution P(9'^\6j , 9k, •■•)■ 
This is accomplished in two steps. First, the conditional probability of 9i is calculated and a 
random number y is generated from the uniform distribution between zero and P{9i\9j,9k, ■■■)■ 
Second, 0- is generated as a uniformly distributed random number from the set of ranges for 
which P{6i\9j,9k.- ) > y. Even though the conditional probability distribution of the ancestral 
state of some focal node may be multimodal (Figure 4), the number of modes is known to be 
less than or equal to the number of nodes to which the focal node is connected by branches. 
In addition, the set of ranges of values for which 'P{9i\9j,9k, ■■■) > y can be found by searching 
around the values of those modes, which are near to the values of the current ancestral states 
estimated for the nodes to which the focal node is connected. Critically, this sampling procedure 
(Figure 5) permits large jumps away from suboptimal likelihood peaks even when the sampled 
distribution is multimodal with widely separated modes. 

Hypothesis testing and model selection 

It is desirable to formulate a statistical model selection criterion to determine whether the stable 
model of continuous character evolution (with a < 2) fits the data better than than the Brownian 
motion model (with a = 2), as a means of estimating the best possible ancestral state estimates 
and of testing the hypothesis that a set of character data at the tips of a phylogeny exhibits 
patterns consistent with departure of the evolutionary process from neutrality. Our Markov chain 
Monte Carlo procedure generates a sample from the posterior distribution of stable parameters 
and ancestral states, from which it is difficult to calculate Akaike's Information Criterion (AIC) 



(Akaike 19741 due to its reliance on maximized likelihood. However there exists a number of 



Bayesian generalizations of AIC which may be calculated from posterior MCMC samples, notably 



the Deviance Information Criterion (DIG) ( Spiegelhalter et al. 2002) which is constructed from 



the deviance of posterior samples and the effective number of parameters in the model. For each 
individual sample in the MCMC chain the deviance D(X, a, c; T) is equal to —2 log L(X, a, c; T) 
plus some unknown normalizing constant which cancels out in comparisons across a pair of 
models. The effective number of parameters p£) is measured across the whole set of samples and 
is defined as D — D, where D is the mean deviance and D is the deviance of the mean parameter 
values. The Deviance Information Criterion itself is defined as DIC — I) + p]j = D + 2p]j, with 
lower DIC values indicating better model fit. The effective number of parameters po may be 
intuitively understood as a measure of dispersion of parameter estimates around their mean values 
with respect to likelihood, which increases with model complexity and which acts as a penalty 
on the mean likelihood of the posterior sample as a whole. Our simulation studies indicated 
that in phylogenetic datasets pjj did not increase sufficiently in line with tree size, resulting in 
overfitting of stable models on large trees. A Bayesian Predictive Information Criterion (BPIC) 



developed by Ando ( Ando and Tsay 2010 personal communication), which amounts to DIC with 
a increased multiplier on the pjy penalty, resolved this problem of overfitting. The simulation 
studies are available on the software website (see below). 

Software implementation 

Efficient software for fitting the stable model to phylogenetic trees and their associated data 
was written in C++ and is available at |http://www.sfu.ca/"'| micke/stabletraits.html as source 
code and also compiled for various operating systems. The sofware reports a posterior sample 
of ancestral state reconstructions and stable parameter values in a format compatible with the 



Tracer software application (Rambaut and Drummond 20121, along with the proportional scale 



reduction factor convergence diagnostic (Brooks and Gelman 1998) and Bayesian Predictive 
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Information Criterion for assessment of model fit (Ando and Tsay 20101. Multiple chains are 
run on independent threads, or on independent processors in a cluster computing environment 
(for which a torque job submission script is also available). 



Application to simulated and natural data 

In order to assess the improvement (if any) in quality of ancestral state reconstruction and 
the ability of statistical tests to identify biological characters that have evolved under a stable 
rather than Brownian stochastic process, data were simulated under a variety of conditions. 
Evolutionary increments were generated randomly from a stable model with unit scale and index 
of stability ranging from 1.0 to 2.0 in steps of 0.2, on random phylogenetic trees with 25, 40, 60, 



90, 130, 175, 235, 325, 440 and 600 tips generated under the Yule model in Mesquite (Maddison 



and Maddison 2011). This procedure gave rise to 60 experimental conditions, each consisting of 
250 trees and simulated datasets, and varying in tree size and index of stability of simulated trait 
values. Based on data at the tips, ancestral states were reconstructed using the modal posterior 
density estimate from MCMC, first with a fixed at 2.0 (representing a Brownian motion model) 
and second with a free a (representing a general symmetric stable model) . Ancestral states were 
also esimated using the homogenous Ornstein-Uhlenbeck model for comparison. Estimates of 
Type I and Type II error rates under the BPIC criterion were made for each tree size/stability 
condition. Accuracy was assessed for Brownian, Stable and OU models by calculating variance of 
the inferred ancestral states from the true simulated states for each simulated dataset under each 
condition. We report here the median variance ratio of stable/Brownian and stable/Ornstein- 
Uhlenbeck reconstructions within each experimental condition. 

To provide a demonstration of the model's application to real biological datasets, we made 



use of a supertree of mammalian species ( Bininda-Emonds et al. 2007 2008 1 and fitted the stable 



model of character evolution to data on the log adult body mass of 1,679 eutherian mammals 
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(combined data from Ernest 2003 Pitnick et al. 2006). For purposes of comparison we also 
estimated ancestral states under two homogenous evolutionary models, the Brownian motion 
model and the Ornstcin-Uhlenbcck model, the latter estimates obtained by maximum likelihood 



search from parameter estimates generated by the SLOUCH package in R (Hansen et al. , 2008). 
Finally we also inferred ancestral states using two heterogenous models, the time-heterogenous 



Early Burst model (Blomberg et al. 2003 Harmon et al. 20101 in which the rate of a Brow- 
nian process increases or decreases exponentially in time, and the clade-heterogenous model of 



Eastman et al. (2011) in which the rate of a Brownian process is "inherited" within clades but 



allowed to occasionally shift in value on probabilistically selected branches of a phylogeny. The 
former ancestral state estimates were made by maximum likelihood search based on parameter 



estimates and scaled phylogenies derived from the GEIGER package in R ( Harmon et al. 2008 ) 
while the latter ancestral state estimates were made by Brownian motion maximum likelihood 
reconstruction on a tree with branch lengths scaled by the maximum posterior probability esti- 



mates of rates reported by the Auteur package in R (Eastman et al. 2011 ). Connections between 
these various models and the stable model are discussed below. 



Results 

Our analysis of simulated evolution on random phylogenetic trees indicates that biological traits 
derived from a Brownian process can be statistically distinguished from those derived from a 
non-Brownian symmetrical stable process by on the basis of the Bayesian Posterior Information 
Criterion (Table [l]). This model selection criterion was found to be highly conservative, with a 
low rate of false rejection of the null hypothesis for trees of all sizes, but with the expected low 
power to detect small departures from the neutral Brownian model on small trees. 

In ancestral state reconstruction of traits simulated under Brownian motion the stable model 
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performed as well as the Brownian model (Table 2), with median squared error ratio ranging from 
1.02 to 1.00, and better than the Ornstein-Uhlenbeck model, with median squared error ratio 
ranging from 0.36 on the smallest tree to 0.86 on the largest (Table 3). For trees simulated under 
the stable process with a < 2, the stable model yielded more accurate ancestral state estimates, 
increasingly so for large trees and large deviations from Brownian motion. For the most extreme 
index of stability considered here (a = 1.0) the mean squared error under Brownian motion 
reconstruction was from 4.8 to 100 times higher, and under Ornstein-Uhlenbeck reconstruction 
from 5.9 to 100 times higher, than the mean squared error under stable reconstruction. 

Results of our analysis of a dataset of mammalian body masses are presented in Table 4. The 
maximum posterior probability estimate of the index of stability a was 1.55 and the Brownian 
motion model was soundly rejected in favour of the stable model under the BPIC model selection 
criterion (ABPIC=:465). The Ornstein-Uhlenbeck also fit significantly better than the Brownian 
motion model (AAICc=24) as did the multi-rate model of Eastman et al. ( 2011[ ) (posterior 
probability of no rate changes = 0) . In order to provide the entries in Table 4 for Early Burst we 
inferred maximum likelihood ancestral states under the global parameter estimate reported by 



Cooper and Purvis (2010) in their broader study of mammalian body mass evolution: the Early 



Burst model did not differ significantly from the Brownian motion model on this dataset. 



Discussion 



Reconstructing a historical narrative of trait evolution over time is central to both the formula- 



tion and testing of hypotheses in evolutionary biology ( Coddington 1988 Pagel 1999 Martins 



2000 ) . Comparative phylogenetic methods do so in a formal framework using stochastic models 



of the evolutionary process that implicity or explicity assume some probabilistic distribution of 



ancestral states over internal nodes of a phylogeny ( Pagel and Harvey 1989 Harvey and Pagel 
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1991 Martins 1996 Maddison and Maddison 2011 Pagel et al. 2004). Brownian motion is a 



fundamental stochastic model of evolution which assumes that biological traits evolve by accru- 
ing incremental changes drawn from a random distribution with zero mean and finite constant 
variance. However, most evolutionary hypotheses of interest involve traits thought to be subject 
to selection leading to directional tendencies, relatively rapid grade shifts and convergent evolu- 
tion. The resulting patterns may be at odds with the neutral drift modelled by Brownian motion 



(Cunningham 1999) . Indeed, studies of the performance of ancestral state reconstruction using 



known ancestral states derived from fossil estimates ( Finarelli and Flynn 2006 Donoghue et al. 



1989 Webster and Purvis 2002 Slater et al. 2012) or from taxa evolving sufficiently rapidly 



to be observed in real time ( Oakley and Cunningham 2000 ) indicate that a mismatch between 



the stochastic model and historical reality can result in incorrect estimates (but see also Polly 



2001). 



In this paper we have described a stochastic model of continuous character evolution based 
on a generalization of the Brownian model of evolution that does not assume that the rate 
of evolutionary change is constant and finite. Under these relaxed assumptions, the sum of 
increments accruing to an evolving character along each branch of a phylogeny is known to 
tend toward a stable limit distribution, which is identical to a normal distribution in the special 
case of Brownian motion but otherwise has heavier tails (Figure 1). These heavy tails allow 
rare evolutionary increments of large magnitude to occur, resulting in a volatile evolutionary 
process characterized by occasional "adaptive" evolutionary shifts interspersed with neutral-like 
patterns of variation (Figure 2). Stable parameters and ancestral states can be fit to biological 
data distributed over a phylogenetic tree using Markov chain Monte Carlo methods. We have 



implemented software that makes use of a slice sampler (Neal 2003) to sample the posterior 



probability distribution of ancestral state assignments at each node and the values of stable 
parameters (the index of stability a, which is equal to 2 under Brownian motion, and scale 



14 



c). The slice sampler is able to take advantage of our knowledge of the approximate location 
of modal regions to move across a multimodal likelihood surface without becoming trapped in 
locally but not globally optimal regions (Figure 5). An additional benefit of the slice sampler 
is its adaptive step size, requiring no tuning of proposal distributions, which makes practical 
application of the method straightforward. Our analysis of simulated data indicates that the 
Bayesian Predictive Information Criterion (BPIC) provides a conservative test of the hypothesis 
of departures from neutrality (i.e., the existence of heavy tails) in an evolutionary process (Table 
[T]), and that the stable model estimates ancestral states with reduced error in comparison with 
the Brownian motion model, when traits evolve by accumulating increments from a probability 



We found the stable model to fit the eutherian body-size data significantly better than the 
Brownian motion model (Table 4), suggesting the existence of departures from neutrality. Under 
the stable model, the ancestral eutherian is relatively small; in line with fossil evidence (also 
presented in Table 4), low body mass persists through early diversification of the Superorders 
Afrotheria, Euarchontoglires and Laurasiathera and the origin of various orders of small size such 
as Primates, Rodentia, Lagomorpha, Scandentia, Afrosoricida and Macroscelidea; large reduc- 
tions in body size are rare, occuring in Chiroptera, while large increases in body size occur with 
the origin of several modern Orders of relatively large species including the ungulates (Perisso- 
dactyla + Artiodactyla), Carnivora, Cetartiodactyla, Proboscidea and Sirenia. The Brownian 
motion model differs in several respects. First the Brownian motion reconstruction of the an- 
cestral eutherian's body mass is an order of magnitude greater than the stable reconstruction; 
the Brownian motion reconstruction thus posits significant reductions of body size prior to the 
evolution of orders of small body size including Rodentia, Scandentia, Chiroptera, Eulipotyphla 
and Macroscelidea. As expected, the Brownian motion model exhibits an "averaging effect" 
more generally, in which transformations in body mass are distributed somewhat evenly over 
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the phylogeny, while the stable model permits large transformations in body mass to occur on 
a smaller subset of branches. For this reason, the ancestral state reconstruction of taxa an- 
cestral to typically large species (i.e., Cetartiodactyla, Perissodactyla, Proboscidea) are smaller 
under the Brownian motion model, and the ancestral state reconstruction of taxa ancestral to 
typically small species (i.e., Rodentia, Lagomorpha, Chiroptera, Eulipotyphla, Afrosoricida, and 
Macroscelidea) are larger under the Brownian motion model. The tendency for ancestral state 
reconstructions to be weighted toward intermediate values is stronger under the Brownian mo- 
tion model than under the stable model since the former model vitiates against the inference of 
directional tendency. 

A desire to model the adaptive evolution of continuous traits has given rise to a number 
of approaches that refine or extend the Brownian model. We categorize these as homogenous 
approaches, in which the stochastic process underlying the generation of evolutionary increments 
to an evolving character does not vary across branches of a phylogenetic tree, versus heteroge- 
nous processes, in which the stochastic process varies across branches. One popular homogenous 



model is the global Ornstein-Uhlenbeck model of trait evolution (Hansen 1997j), in which the 
direction and rate of evolution at any time depends upon a selection coefficient and the degree of 
deviation of the trait's current value from some global optimum or "phylogenetic mean" . This so- 
called "mean-reverting" process has been used as a model of stabilizing selection since deviation 
away from the phylogenetic mean is penalized under maximum likelihood reconstruction. Our 
simulation studies indicate that the stable model estimates ancestral states with reduced error in 
comparison to the homogenous Ornstein-Uhlenbeck model, when traits evolve by accumulating 
increments from a probability distribution without fixed variance (Table |3]). The homogenous 
Ornstein-Uhlenbeck reconstruction of mammalian body mass (Table 4) is in some respects inter- 
mediate between the Brownian motion and stable reconstructions, with relatively small ancestors 
of Orders with small body size and relatively large ancestors of Orders with large body size. This 
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can be interpreted in terms of the stabilizing selection model with an intermediate phylogenetic 
mean: the proposal of a small ancestral rodent (for example) permits a tendency to evolve back 
toward the mean within the rodent clade, while the proposal of a large ancestral carnivore (for 
example) permits the same tendency in the opposite direction within the carnivore clade. This 
pattern is most striking in the ancestral state inferred for Afrotheria (8.95kg versus 1.73kg under 
Brownian motion and 0.36kg under the stable model), where a large ancestor reduces the rate 
of evolution on branches leading ultimately to the large elephants and manatees while permit- 
ting many high-likelihood reductions of size toward the phylogenetic mean in small taxa such as 
Afrosoricida and Macroscelidea. This reconstruction for Afrotheria seems unlikely and may lead 
us to suppose that a single phylogenetic mean for the entire Eutheria does not form a realistic 
model of stabilizing selection. 



Cooper and Purvis (20101 report success in fitting more complex heterogenous models to a 



larger set of mammalian body mass data. The Ornstein-Uhlenbeck model is easily extended to 



the heterogenous case by permitting more than one clade-specific phylogenetic means (Hansen 



1997 Butler and King 2004 Hansen et al. 2008 Beaulieu et al. 2012 1, the number and phy- 



logenetic position of such means being specified a priori or estimated from the data. Various 
transformations of the phylogeny, such as raising all branch lengths to a constant power in order 



to approximate speciational change (Pagel 1999) or somewhat ad hoc transformation of branch 



lengths to maximize the likelihood of a Brownian model ( Grafen 1989 Gittleman and Kot , 1990 



Garland et al. 1992 ) have also been used to generate implicitly heterogenous models. The Early 



Burst model (Blomberg et al. 2003 Harmon et al. 2010), also applied by Cooper and Purvis 



(2010), is interesting in generating rate heterogeneity by allowing the rate of a Brownian motion 



to process to vary over time, rather than across branches or clades. The rate of evolution is 
taken to be an exponentially increasing or decreasing function of node height, allowing a greater 
proportion of evolutionary change to occur early in the phylogeny or late in the phylogeny de- 
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pending on the choice of an exponential scaUng factor r. We apphed the Early Burst model to 
our mammalian body mass dataset but did not identify a significant deviation from r = 0; ances- 
tral state estimates in Table 4 are derived from a reconstruction based on r = —0.009 estimated 



by Cooper and Purvis (2010). The concentration of more evolutionary change in basal branches 
of the phylogeny appears to allow rapid early deviation from the phylogenetic mean value, with 
the majority of ancestral taxa exhibiting marginally smaller body sizes, often more consistent 
with the fossil evidence, than under the Brownian motion reconstruction, yet surprisingly with 
considerably less ordinal-level diversification than imputed by the stable model which does not 
build early diversification in to the stochastic process itself. 

Homogenous approaches offer a number of advantages over heterogenous approaches in that 
the latter category must not only infer the parameters of the stochastic process but also must infer 
the structure of rate heterogeneity over the phylogeny. Especially when heterogenity is associated 
with clades, overfitting heterogenous models by positing too many rate shifts or clade-spccific 



evolutionary regimes may become a danger. Eastman et al. (2011 1 have proposed a heterogenous 



model of trait evolution that explicitly avoids overfitting by sampling over model parameter values 
and the number of model parameters simultaneously. The model is an extension of the standard 
Brownian motion model in which the rate of evolution is "inherited" over time but may undergo 
occasional shifts in value. Each shift introduces a new parameter that is penalized in a reversible 
jump Markov chain Monte Carlo algorithm. We found that the complexity of the reversible 
jump algorithm considerably increases the computational burden of fitting the model: for the 
body mass data considered here, the stable slice sampler accomplished around 70,000 steps per 
minute on a modest dual-core home laptop, versus around 2,500 per minute for the multi-rate 
model, without the need for a lengthy calibration of proposal densities beforehand. Ancestral 
states reconstructed under the Eastman model (Table 4) were in broad agreement with other 
non-Brownian methods presented here in imputing smaller early mammals, and were in close 
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agreement with results of the Early Burst analysis. 

In general, the stable model suggests a greater degree of ordinal-level diversification of mam- 
malian body masses, and appears to accommodate a more volatile evolutionary process, than 
any of the other models considered here. For 13 of the 22 nodes listed in Table 4 the stable 
model reconstructs the smallest body masses of any model, and for 4 nodes it reconstructs the 
largest body mass of any model, making the stable reconstruction consistent with the hypoth- 
esis of small early mammals and occasional marked ordinal-level enlargement. The ability of 
the stable model to accommodate striking variation in evolutionary rate, even more so than 



approaches such as that of Eastman et al. (2011 1 explicitly designed to model such variation, is 



most apparent in the highly diverse and species-poor Afrotheria, where the stable reconstruction 
involves the largest ancestral elephants and manatees yet the smallest Afroinsectivores of any 
of the methods considered here. While the heterogenous Ornstein-Uhlenbeck model binds rate 
volatility to the structure of the phylogeny through the assumption of clade-specific phylogenetic 
means, and the Eastman et al. RJ-MCMC model binds rate volatility to the structure of the 
phylogeny through the "inheritance" of rate shifts from ancestral to descendant branches, the 
stable model through its homogenously heavy tails provides unstructured volatility that is able 
to concentrate the production of evolutionary variation onto relatively few branches scattered 
across the phylogeny. 

The stable model is the simplest non-Brownian model considered here, requiring only a single 
parameter in addition to the standard Brownian motion model. The relative efficiency of the 
estimation procedure used to fit the stable model may make it attractive for analysis of very 
large trees or large sets of trees derived from Bayesian phylogenetics. Furthermore, deviation 
from the Brownian model according to the BPIC criterion may be used to provide independent 
statistical support for the adoption of one of the more complex heterogenous models currently 
available. Rates imputed by the stable model may guide appropriate selection of branches for 
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independent rates in such cases. In the mammal body mass data examined here, for example, the 
frequency distribution of standardized trait changes along branches of the phylogeny (reported 
by the accompanying software) indicates accelerated evolution at the origin of a number of 
clades including Hyomys (white-eared giant rats), Tragulidac (mouse deer), Manidae (pangolins), 
Megachiroptera (megabats), Megadermatidae (false vampire bats), Solenodontidae (solenodons) , 
Orycteropodidae (aardvark) and Hyracoidea (hyraxes), suggesting that these clades may merit 
their own phylogenetic mean values under a heterogenous Ornstein-Uhlenbeck approach. In 
order to determine whether such a model is useful in any particular case it is necessary, as with 
any stochastic model of evolution, to rigorously constrain the model empirically, and while the 
results presented here are primarily illustrative and to provide comparison across unconstrained 
models, we note that the low ancestral state inferences for extinct taxa at the root of Rodentia, 
Lagomorpha, Primates, Chiroptera and Lipotyphla, and the high ancestral state inferences for 
taxa at the root of Sirenia and Elephantidae, appear broadly in line with fossil evidence. 

While our homogenous approach may be associated with some advantages with respect to 
efficiency, simplicity and unstructured volatility, the heterogenous models such as Early Burst 
have the benefit of imposing an explicit evolutionary narrative on the process of trait diversifica- 
tion which may be useful for exploring and testing general hypotheses about historical processes 



(Harmon et al. 2010). Heterogenous models typically involve the elaboration of a simple Gaus- 
sian kernel to accommodate phylogeny- or time-structured variation in the evolutionary process. 
We suggest that in future work heterogenous stable models analogous to those considered above 
may be readily generated by directly replacing the Gaussian kernel with the more general stable 
kernel, at the expense of a single parameter. Stable Ornstein-Uhlenbeck processes, for example. 



are already well-characterized ( Samorodnitsky and Taqqu 1994 ) . The stable model we introduce 



to phylogenetic evolutionary biology here may find other uses, for example in assigning substitu- 



tion rates to edges on phylogenetic trees under relaxed clock models ( Thorne et al. 1998 ) . One 
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primary obstacle to the replacement of Gaussian kernels by stable kernels in models of continu- 



ous character evolution is that that stable distributions have undefined variance ( Samorodnitsky 



and Taqqu 1994 1. Methods making direct use of variance are typically used to detect correlated 



evolution between multiple continuous characters evolving on the same phylogenetic tree (Mar- 



tins 



1996). Independent contrasts (Felsenstein 1985) for example, generates standardized data 



points for each univariate character by scaling the increments accruing along paired branches 
of a phylogeny by the square root of the sum of branch lengths, which is proportional to the 



expected standard deviation of a Brownian process. Methods of phylogenetic regression (Grafen 



1989 Garland et al. 1993 Martins and Hansen 1997) extend least squares methods to multi- 



variate phylogenetic data by incorporating branch length and topological information into the 
model's covariance matrix. The fact that stable variance is undefined means that there is no 
stable equivalent to standard deviation or the covariance matrix. We note that regression and 
correlation models based on stochastic processes driven by non-Gaussian stable perturbations 



have been implemented successfully in non-phylogcnetic fields (i.e., McGulloch 1998a Frain 



2008 Paulaauskas and Rachev 2003 ) . These approaches raise the prospect that likelihood-based 



analysis of heavy tailed multivariate distributions may offer useful insights into future studies of 
correlated evolution of multiple continuous characters in evolutionary biology, since correlated 
evolution is precisely the kind of problem domain in which the putative Brownian assumptions 
of neutrality and gradualism are likely to be invalid. 
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Error 
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0.11 
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0.04 
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0.00 


0.00 


0.00 


0.00 


0.00 
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0.66 
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0.47 
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0.04 
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0.00 
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0.00 




1.2 


0.09 
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0.00 


0.00 




1.0 


0.02 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 



Table 1: Type I and Type II error rates resulting from stable versus Brownian model selection 
using the BPIC model selection criterion, with data simulated on trees of various sizes under 
stable processes with varying indices of stability. 
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0.36 
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0.14 
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0.08 
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0.21 


0.16 


0.08 


0.07 


0.06 


0.04 


0.03 


0.02 


0.02 


0.01 



Table 2: Sum of squared errors in ancestral state reconstruction, median ratio of stable/Brownian 
reconstruction error, with data simulated on trees of various sizes under stable processes with 
varying indices of stability. 
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Table 3: Sum of squared errors in ancestral state reconstruction, median ratio of stable/Ornstein- 
Uhlenbeck reconstruction error, with data simulated on trees of various sizes under stable pro- 
cesses with varying indices of stability. 
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Figure 1: Log probability densities of stable distributions with varying a, including the normal 
distribution (a=2) and the Cauchy distribution (a = 1) as special cases. 
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Figure 2: Stable random walks with varying a, driven by accumulation of a single sample of 500 
increments drawn from a uniform distribution between zero and one, transformed onto stable 
distributions using the inversion method ( Devroye 1986 ) . 
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Figure 3: Ancestral state reconstruction of data simulated under Brownian motion. Top row: 
phylogenies exhibiting ancestral state reconstructions; each tip is labelled with the known char- 
acter state, while at each internal node, upper values represent the Brownian motion maximum 
likelihood reconstruction and lower values represent the reconstruction under the stable Markov 
chain Monte Carlo model to be described in this paper (left: original simulated data; right: the 
character value of a single tip has been multiplied by ten to simulate rapid evolution (or mea- 
surement error) on a single branch). Bottom row: the marginal probability density derived from 
MCMC for the ancestral state assignment of the root node using unmodified (left) and modified 
(right) data; the Brownian motion marginal probability density is grey and the stable margin 
probability density is black. 
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Figure 4: (A) A focal node in a phylogenetic tree has two children with trait values 2 and 3, and a 
parent with trait value 0, to which it is connected by branches of unit length. (B) The potentially 
multimodal conditional probability density function describing the probability with which the 
focal node has some trait value given the values of its children and parent (several curves differ 
in terms of the scale parameter of the stable distribution, but have a common a = 1.5). 
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Figure 5: An example of slice sampling updating the value of the focal node from Figure 4(A) 
from 9i to 0[. First, the conditional probability that the value of the evolving character at the 
focal node is equal to 9i — given the other ancestral states and stable model parameters — is 
calculated. A random number y is drawn from the uniform distribution between zero and this 
conditional probability (marked with a dotted line). The set of possible values of 9[ is estimated 
by bracketing regions around the modes of the distribution for which the conditional probability 
is greater than y (solid lines and shaded region of the distribution). The new value 6[ is drawn 
as a uniform random variable from this set. 
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